NSF PAR Search | NSF Public Access Repository

Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers

https://doi.org/10.1109/HPCA61900.2025.00029

Li, Chaojian; Li, Sixu; Jiang, Linrui; Zhang, Jingqun; Lin, Yingyan Celine (March 2025, IEEE)

Free, publicly-accessible full text available March 1, 2026
Fusion-3D: Integrated Acceleration for Instant 3D Reconstruction and Real-Time Rendering

https://doi.org/10.1109/MICRO61859.2024.00016

Li, Sixu; Zhao, Yang; Li, Chaojian; Guo, Bowei; Zhang, Jingqun; Zhu, Wenbo; Ye, Zhifan; Wan, Cheng; Lin, Yingyan Celine (November 2024, IEEE)

Full Text Available
INVITED: Data4AIGChip: An Automated Data Generation and Validation Flow for LLM-assisted Hardware Design

Zhang, Yongan; Fu, Yonggan; Yu, Zhongzhi; Zhao, Kevin; Wan, Cheng; Li, Chaojian; Lin, Yingyan Celine (June 2024, ACM)

Full Text Available
MixRT: Mixed Neural Representations For Real-Time NeRF Rendering

https://doi.org/10.1109/3DV62453.2024.00087

Li, Chaojian; Wu, Bichen; Vajda, Peter; Lin, Yingyan Celine (March 2024, The 11th International Conference on 3D Vision)

Neural Radiance Field (NeRF) has emerged as a leading technique for novel view synthesis, owing to its impressive photorealistic reconstruction and rendering capability. Nevertheless, achieving real-time NeRF rendering in large-scale scenes has presented challenges, often leading to the adoption of either intricate baked mesh representations with a substantial number of triangles or resource-intensive ray marching in baked representations. We challenge these conventions, observing that high-quality geometry, represented by meshes with substantial triangles, is not necessary for achieving photorealistic rendering quality. Consequently, we propose MixRT, a novel NeRF representation that includes a low-quality mesh, a view-dependent displacement map, and a compressed NeRF model. This design effectively harnesses the capabilities of existing graphics hardware, thus enabling real-time NeRF rendering on edge devices. Leveraging a highly-optimized WebGL-based rendering framework, our proposed MixRT attains real-time rendering speeds on edge devices (over 30 FPS at a resolution of 1280 x 720 on a MacBook M1 Pro laptop), better rendering quality (0.2 PSNR higher in indoor scenes of the Unbounded-360 datasets), and a smaller storage size (less than 80% compared to state-of-the-art methods).
Full Text Available
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models

https://doi.org/10.1109/ICCAD57390.2023.10323953

Fu, Yonggan; Zhang, Yongan; Yu, Zhongzhi; Li, Sixu; Ye, Zhifan; Li, Chaojian; Wan, Cheng; Lin, Yingyan Celine (October 2023, IEEE)
ERSAM: Neural Architecture Search for Energy-Efficient and Real-Time Social Ambiance Measurement

https://doi.org/10.1109/ICASSP49357.2023.10095360

Li, Chaojian; Chen, Wenwan; Yuan, Jiayi; Lin, Yingyan Celine; Sabharwal, Ashutosh (June 2023, 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

Social ambiance describes the context in which social interactions happen, and can be measured using speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Social Ambiance Measurement (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned applications, the computational complexity required by state-of-the-art deep neural network (DNN)-powered SAM solutions stands at odds with the often constrained resources on mobile devices. Furthermore, only limited labeled data is available or practical when it comes to SAM under clinical settings due to various privacy constraints and the required human effort, further challenging the achievable accuracy of on-device SAM solutions. To this end, we propose a dedicated neural architecture search framework for Energy-efficient and Real-time SAM (ERSAM). Specifically, our ERSAM framework can automatically search for DNNs that push forward the achievable accuracy vs. hardware efficiency frontier of mobile SAM solutions. For example, ERSAM-delivered DNNs consume only 40 mW · 12 h of energy and 0.05 seconds of processing latency for a 5-second audio segment on a Pixel 3 phone, while achieving an error rate of only 14.3% on a social ambiance dataset generated from LibriSpeech. We expect that our ERSAM framework can pave the way for ubiquitous on-device SAM solutions, which are in growing demand.
Full Text Available
An Investigation on Hardware-Aware Vision Transformer Scaling

https://doi.org/10.1145/3611387

Li, Chaojian; Kim, Kyungmin; Wu, Bichen; Zhang, Peizhao; Zhang, Hang; Dai, Xiaoliang; Vajda, Peter; Lin, Yingyan (August 2023, ACM Transactions on Embedded Computing Systems)

Vision Transformer (ViT) has demonstrated promising performance in various computer vision tasks and has recently attracted a lot of research attention. Many recent works have focused on proposing new architectures to improve ViT and deploying it into real-world applications. However, little effort has been made to analyze and understand ViT’s architecture design space and its hardware-cost implications on different devices. In this work, by simply scaling ViT’s depth, width, input size, and other basic configurations, we show that a scaled vanilla ViT model without bells and whistles can achieve an accuracy-efficiency trade-off comparable or superior to that of most of the latest ViT variants. Specifically, compared to DeiT-Tiny, our scaled model achieves a \(\uparrow 1.9\%\) higher ImageNet top-1 accuracy under the same FLOPs and a \(\uparrow 3.7\%\) better ImageNet top-1 accuracy under the same latency on an NVIDIA Edge GPU TX2. Motivated by this, we further investigate the extracted scaling strategies from the following two aspects: (1) “can these scaling strategies be transferred across different real hardware devices?”; and (2) “can these scaling strategies be transferred to different ViT variants and tasks?”. For (1), our exploration, based on various devices with different resource budgets, indicates that the transferability effectiveness depends on the underlying device together with its corresponding deployment tool; for (2), we validate the effective transferability of the aforementioned scaling strategies obtained from a vanilla ViT model on top of an image classification task to the PiT model, a strong ViT variant targeting efficiency, as well as object detection and video classification tasks. In particular, when transferred to PiT, our scaling strategies boost ImageNet top-1 accuracy from \(74.6\%\) to \(76.7\%\) (\(\uparrow 2.1\%\)) under the same 0.7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by \(\uparrow 0.7\%\) under a similar throughput on a V100 GPU.
Full Text Available
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention

https://doi.org/10.1109/HPCA56546.2023.10071081

Dass, Jyotikrishna; Wu, Shang; Shi, Huihong; Li, Chaojian; Ye, Zhifan; Wang, Zhongfeng; Lin, Yingyan (February 2023, 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Full Text Available
Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction

https://doi.org/10.1145/3579371.3589115

Li, Sixu; Li, Chaojian; Zhu, Wenbo; Yu, Boyang; Zhao, Yang; Wan, Cheng; You, Haoran; Shi, Huihong; Lin, Yingyan (June 2023, The IEEE/ACM International Symposium on Computer Architecture 2023 (ISCA 2023))

Full Text Available
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design

https://doi.org/10.1109/HPCA56546.2023.10071027

You, Haoran; Sun, Zhanyi; Shi, Huihong; Yu, Zhongzhi; Zhao, Yang; Zhang, Yongan; Li, Chaojian; Li, Baopu; Lin, Yingyan (February 2023, 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Vision Transformers (ViTs) have achieved state-of-the-art performance on various vision tasks. However, ViTs’ self-attention module is still arguably a major bottleneck, limiting their achievable hardware efficiency and more extensive applications to resource-constrained platforms. Meanwhile, existing accelerators dedicated to NLP Transformers are not optimal for ViTs. This is because there is a large difference between ViTs and Transformers for natural language processing (NLP) tasks: ViTs have a relatively fixed number of input tokens, whose attention maps can be pruned by up to 90% even with fixed sparse patterns, without severely hurting the model accuracy (e.g., ≤1.5% under a 90% pruning ratio); while NLP Transformers need to handle input sequences of varying numbers of tokens and rely on on-the-fly predictions of dynamic sparse attention patterns for each input to achieve a decent sparsity (e.g., ≥50%). To this end, we propose a dedicated algorithm and accelerator co-design framework dubbed ViTCoD for accelerating ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the attention maps to have either denser or sparser fixed patterns for regularizing two levels of workloads without hurting the accuracy, largely reducing the attention computations while leaving room for alleviating the remaining dominant data movements; on top of that, we further integrate a lightweight and learnable auto-encoder module to enable trading the dominant high-cost data movements for lower-cost computations. On the hardware level, we develop a dedicated accelerator to simultaneously coordinate the aforementioned enforced denser and sparser workloads for boosted hardware utilization, while integrating on-chip encoder and decoder engines to leverage ViTCoD’s algorithm pipeline for much reduced data movements. Extensive experiments and ablation studies validate that ViTCoD largely reduces the dominant data movement costs, achieving speedups of up to 235.3×, 142.9×, 86.0×, 10.1×, and 6.8× over general computing platforms (CPUs, EdgeGPUs, and GPUs) and prior-art Transformer accelerators (SpAtten and Sanger), respectively, under an attention sparsity of 90%. Our code implementation is available at https://github.com/GATECH-EIC/ViTCoD.
Full Text Available
